Source-Error Aware Phrase-Based Decoding for Robust Conversational Spoken Language Translation
نویسندگان
چکیده
Spoken language translation (SLT) systems typically follow a pipeline architecture, in which the best automatic speech recognition (ASR) hypothesis of an input utterance is fed into a statistical machine translation (SMT) system. Conversational speech often generates unrecoverable ASR errors owing to its rich vocabulary (e.g. out-of-vocabulary (OOV) named entities). In this paper, we study the possibility of alleviating the impact of unrecoverable ASR errors on translation performance by minimizing the contextual effects of incorrect source words in target hypotheses. Our approach is driven by locally-derived penalties applied to bilingual phrase pairs as well as target language model (LM) likelihoods in the vicinity of source errors. With oracle word error labels on an OOV word-rich English-toIraqi Arabic translation task, we show statistically significant relative improvements of 3.2% BLEU and 2.0% METEOR over an error-agnostic baseline SMT system. We then investigate the impact of imperfect source error labels on error-aware translation performance. Simulation experiments reveal that modest translation improvements are to be gained with this approach even when the source error labels are noisy.
منابع مشابه
Semi-Supervised Word Sense Disambiguation for Mixed-Initiative Conversational Spoken Language Translation
Lexical ambiguity can cause critical failure in conversational spoken language translation (CSLT) systems due to the wrong sense being presented in the target language. In this paper, we present a framework for improving translation of ambiguous source words that (a) constrains statistical machine translation (SMT) decoding with phrase pair clusters to select a desired sense for translation; (b...
متن کاملNiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierachical phrase-based model, and various syntaxbased models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers...
متن کاملAugmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation
We propose a novel technique for adapting text-based statistical machine translation to deal with input from automatic speech recognition in spoken language translation tasks. We simulate likely misrecognition errors using only a source language pronunciation dictionary and language model (i.e., without an acoustic model), and use these to augment the phrase table of a standard MT system. The a...
متن کاملIBM spoken language translation system evaluation
We discuss phrase-based statistical machine translation performance enhancing techniques which have proven effective for Japanese-to-English and Chinese-toEnglish translation of BTEC corpus. We also address some issues that arise in conversational speech translation quality evaluations.
متن کاملLightly-Supervised Word Sense Translation Error Detection for an Interactive Conversational Spoken Language Translation System
Lexical ambiguity can lead to concept transfer failure in conversational spoken language translation (CSLT) systems. This paper presents a novel, classificationbased approach to accurately detecting word sense translation errors (WSTEs) of ambiguous source words. The approach requires minimal human annotation effort, and can be easily scaled to new language pairs and domains, with only a wordal...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013